
    Leveraging Evolutionary Changes for Software Process Quality

    Real-world software applications must constantly evolve to remain relevant. This evolution occurs when developing new applications or adapting existing ones to meet new requirements, make corrections, or incorporate future functionality. Traditional methods of software quality control involve software quality models and continuous code inspection tools. These measures focus on directly assessing the quality of the software. However, there is a strong correlation and causation between the quality of the development process and the resulting software product. Therefore, improving the development process indirectly improves the software product, too. To achieve this, effective learning from past processes is necessary, often embraced through post-mortem organizational learning. While qualitative evaluation of large artifacts is common, smaller quantitative changes captured by application lifecycle management are often overlooked. In addition to software metrics, these smaller changes can reveal complex phenomena related to project culture and management. Leveraging these changes can help detect and address such complex issues. Software evolution was previously measured by the size of changes, but the lack of consensus on a reliable and versatile quantification method prevents its use as a dependable metric. Different size classifications fail to reliably describe the nature of evolution. While application lifecycle management data is rich, identifying which artifacts can model detrimental managerial practices remains uncertain. Approaches such as simulation modeling, discrete-event simulation, or Bayesian networks have only limited ability to exploit continuous-time process models of such phenomena. Even worse, the accessibility and mechanistic insight into such gray- or black-box models are typically very low. To address these challenges, we suggest leveraging objectively [...]
    Comment: Ph.D. Thesis without appended papers, 102 pages

    Technical Reports Compilation: Detecting the Fire Drill anti-pattern using Source Code and issue-tracking data

    Detecting the presence of project management anti-patterns (AP) currently requires experts on the matter and is an expensive endeavor. Worse, experts may introduce their individual subjectivity or bias. Using the Fire Drill AP, we first introduce a novel way to translate descriptions into detectable APs that are composed of arbitrary metrics and events, such as logged time or maintenance activities, which are mined from the underlying source code or issue-tracking data, thus making the description objective as it becomes data-based. Secondly, we demonstrate a novel method to quantify and score the deviations of real-world projects from data-based AP descriptions. Using nine real-world projects that exhibit a Fire Drill to some degree, we show how to further enhance the translated AP. The ground truth in these projects was extracted from two individual experts, and consensus was found between them. Our evaluation spans three kinds of patterns: the first is derived purely from the description, the second is enhanced by data, and the third is derived from data only. The Fire Drill AP translated from the description only, for either source code- or issue-tracking-based detection, shows only weak potential for confidently detecting the presence of the anti-pattern in a project. Enriching the AP with data from real-world projects significantly improves detection. Using patterns derived from data only leads to almost perfect correlations of the scores with the ground truth. Some APs share symptoms with the Fire Drill AP, and we conclude that the presence of similar patterns is most certainly detectable. Furthermore, any pattern that can be characteristically modeled using the proposed approach is potentially well detectable.
    Comment: 208 pages
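
    The deviation scoring is described only at a high level here. As a rough illustration, the sketch below scores how far a project's observed activity curve lies from a pattern curve over normalized project time, using the mean absolute difference between the two curves; the curves, the metric, and the scoring rule are our own assumptions, not the technical reports' implementation.

```python
# Illustrative sketch only (not the reports' method): score a project's
# observed activity curve against a data-based pattern curve, both sampled
# over normalized project time and scaled to [0, 1].
import numpy as np

def deviation_score(pattern, observed):
    """Score in [0, 1]: 1.0 means the observed curve matches the pattern
    exactly, 0.0 means maximal deviation on the normalized scale."""
    pattern = np.asarray(pattern, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Mean absolute difference; at most 1.0 when both curves live in [0, 1].
    return 1.0 - float(np.mean(np.abs(pattern - observed)))

# Hypothetical curves: a pattern with a late burst of (e.g., corrective)
# activity versus a project whose activity ramps up more evenly.
t = np.linspace(0.0, 1.0, 100)
pattern = np.clip((t - 0.7) / 0.3, 0.0, 1.0)   # flat, then a steep rise near the end
observed = t ** 2                              # gradual, accelerating rise
print(round(deviation_score(pattern, observed), 3))
```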

    Efficient Automatic Change Detection in Software Maintenance and Evolutionary Processes

    Software maintenance is such an integral part of a software product's evolutionary process that it consumes much of the total resources available. Some estimate the cost of maintenance to be up to 100 times the cost of initial development. Software that is not maintained builds up technical debt, and if no countermeasures are taken, that debt will eventually outweigh the value of the software. Software must adapt to changes in its environment and to new or changed requirements. It must further receive corrections for emerging faults and vulnerabilities. Constant maintenance can prepare software to accommodate future changes. While there may be plenty of rationale for future changes, the reasons behind historical changes may no longer be accessible. Understanding change in software evolution provides valuable insights into, e.g., the quality of a project or aspects of the underlying development process. These insights are worth exploiting for, e.g., fault prediction, managing the composition of the development team, or effort estimation models. The size of software is a metric often used in such models, yet it is not well-defined. In this thesis, we seek to establish a robust, versatile, and computationally cheap metric that quantifies the size of changes made during maintenance. We operationalize this new metric and exploit it for automated and efficient commit classification. Our results show that the density of a commit, that is, the ratio between its net- and gross-size, is a metric that can replace other, more expensive metrics in existing classification models. Models using this metric represent the current state of the art in automatic commit classification. The density provides a finer-grained and more detailed insight into the types of maintenance activities in a software project. Additional properties of commits, such as their relations or intermediate sojourn-times, have not been previously exploited for improved classification of changes. We reason about their potential and suggest and implement dependent mixture models and Bayesian models that exploit joint conditional densities; each of these models has its own trade-offs with regard to computational cost, complexity, and prediction accuracy. Such models can outperform well-established classifiers, such as Gradient Boosting Machines. All of our empirical evaluations comprise large datasets, software, and experiments, which we have published alongside the results as open access. We have reused, extended, and created datasets, and released software packages for change detection and Bayesian models used for all of the studies conducted.
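
    The density metric itself is a simple ratio. The sketch below computes it and places it next to other cheap per-commit features; the helper name, the example values, and the assumption that net size discounts, e.g., whitespace- or comment-only changes are illustrative rather than taken from the thesis' tooling.

```python
# Minimal sketch, not the thesis' implementation: the density of a commit as
# the ratio of its net size to its gross size, used as one cheap feature
# among others. Field names and values are hypothetical.
def commit_density(net_size: int, gross_size: int) -> float:
    """Gross size counts all changed lines; net size is assumed here to
    exclude, e.g., whitespace- or comment-only changes."""
    if gross_size == 0:
        return 0.0
    return net_size / gross_size

# Example: a commit touching 120 lines, of which 45 are substantive.
features = {
    "density": commit_density(45, 120),   # 0.375
    "files_changed": 7,
    "is_merge": 0,
}
print(features["density"])
```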

    Temporal data analysis facilitating recognition of enhanced patterns

    Assessing the source code quality of software objectively requires a well-defined model. Due to the distinct nature of each project, the definition of such a model is specific to the underlying paradigms used. A definer can pick metrics from standard norms to define measurements for qualitative assessment. Software projects develop over time, and a wide variety of refactorings is applied to the code, which makes the process temporal. In this thesis, the temporal model was enhanced using methods known from financial markets and further evaluated using artificial neural networks, with the goal of improving prediction precision by learning from more detailed patterns. It was also investigated whether the combination of technical analysis and machine learning is viable and how to blend them. An in-depth selection of applicable instruments and algorithms was made, and extensive experiments were run to approximate answers. It was found that enhanced patterns are of value for further processing by neural networks. Technical analysis, however, was not able to improve the results, although it is assumed that it can for an appropriately sized problem set.
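
    To make the idea of enhancing temporal metric data with instruments borrowed from technical analysis more concrete, the following minimal sketch smooths a metric series with a simple moving average before stacking it into feature rows for a learner; the metric, window size, and feature layout are assumptions for illustration only, not the thesis' setup.

```python
# Illustrative sketch: enhance a temporal series of a code metric with a
# simple moving average (a basic technical-analysis indicator) and build
# feature rows that could be fed to, e.g., a neural network.
import numpy as np

def moving_average(series, window=3):
    series = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

# A hypothetical weekly series of a complexity metric for one project.
complexity = [12, 14, 13, 17, 19, 22, 21, 25, 24, 28]
smoothed = moving_average(complexity, window=3)
# Pair each raw value with its smoothed counterpart as a feature row.
features = np.column_stack([complexity[2:], smoothed])
print(features.shape)  # (8, 2)
```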

    Exploiting Relations, Sojourn-Times, and Joint Conditional Probabilities for Automated Commit Classification

    The automatic classification of commits can be exploited for numerous applications, such as fault prediction or determining maintenance activities. Additional properties, such as parent-child relations or sojourn-times between commits, were not previously considered for this task. However, such data cannot be leveraged well using traditional machine learning models, such as Random Forests. Suitable models are, e.g., Conditional Random Fields or recurrent neural networks. We reason about the Markovian nature of the problem and propose models to address it. The first model is a generalized dependent mixture model, which facilitates the Forward algorithm for 1st- and 2nd-order processes and uses maximum likelihood estimation. We then propose a second, non-parametric model that uses Bayesian segmentation and kernel density estimation and can be effortlessly adapted to work with nth-order processes. Using an existing dataset with labeled commits as ground truth, we extend this dataset with relations between commits and their sojourn-times, by first re-engineering the labeling rules and reaching high agreement between labelers. We show the strengths and weaknesses of either kind of model and demonstrate their ability to outperform the state of the art in automated commit classification.
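
    As a rough illustration of the Markovian view, the following sketch runs the forward pass of a first-order process over commit activity labels; the states, transition matrix, and toy emission densities are invented for the example and are not the proposed models' estimates.

```python
# Minimal sketch of the forward algorithm for a first-order process over
# commit activity labels. All probabilities below are made up for
# illustration; the proposed models estimate such quantities from data.
import numpy as np

states = ["adaptive", "corrective", "perfective"]
pi = np.array([0.4, 0.3, 0.3])                # initial state distribution
A = np.array([[0.60, 0.20, 0.20],             # A[i, j] = P(state_t = j | state_{t-1} = i)
              [0.30, 0.50, 0.20],
              [0.25, 0.25, 0.50]])

def emission(density):
    """Toy emission: likelihood of an observed commit density per state."""
    return np.array([
        np.exp(-((density - 0.7) ** 2) / 0.1),   # adaptive: high density
        np.exp(-((density - 0.2) ** 2) / 0.1),   # corrective: low density
        np.exp(-((density - 0.5) ** 2) / 0.1),   # perfective: medium density
    ])

def forward(observations):
    """Return per-commit state probabilities via the normalized forward pass."""
    alpha = pi * emission(observations[0])
    alpha /= alpha.sum()
    posteriors = [alpha]
    for obs in observations[1:]:
        alpha = emission(obs) * (alpha @ A)
        alpha /= alpha.sum()                      # normalize to avoid underflow
        posteriors.append(alpha)
    return np.array(posteriors)

post = forward([0.68, 0.21, 0.55, 0.74])
print(states[int(np.argmax(post[-1]))])
```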

    Metrics As Scores : A Tool- and Analysis Suite and Interactive Application for Exploring Context-Dependent Distributions

    Metrics As Scores can be thought of as an interactive, multiple analysis of variance (abbr. "ANOVA," Chambers et al., 2017). An ANOVA might be used to estimate the goodness-of-fit of a statistical model. Beyond ANOVA, which is used to analyze the differences among hypothesized group means for a single quantity (feature), Metrics As Scores seeks to answer the question of whether a sample of a certain feature is more or less common across groups. This approach to data visualization and exploration has been used previously (e.g., Jiang et al., 2022). Beyond this, Metrics As Scores can determine what might constitute a good/bad, acceptable/alarming, or common/extreme value, and how distant the sample is from that value, for each group. This is expressed in terms of a percentile (a standardized scale of [0, 1]), which we call a score. Considering all available features among the existing groups furthermore allows the user to assess how different the groups are from each other, or whether they are indistinguishable from one another. The name Metrics As Scores was derived from its initial application: examining differences of software metrics across application domains (Hönel et al., 2022). A software metric is an aggregation of one or more raw features according to some well-defined standard, method, or calculation. In software processes, such aggregations are often counts of events or certain properties (Florac & Carleton, 1999). However, without the aggregation that is done in a quality model, raw data (samples) and software metrics are rarely of great value to analysts and decision-makers. This is because quality models are conceived to establish a connection between software metrics and certain quality goals (Kaner & Bond, 2004). It is, therefore, difficult to answer the question "is my metric value good?". With Metrics As Scores, we present an approach that, given some ideal value and a sample of sufficiently many relevant values, can transform any observation into a score. While previous work attempted to derive such ideal values for software metrics from, e.g., experience or surveys (Benlarbi et al., 2000), benchmarks (Alves et al., 2010), or by setting practical values (Grady, 1992), with Metrics As Scores we additionally suggest deriving ideal values in non-parametric, statistical ways. To do so, data first needs to be captured in a relevant context (group). A feature value might be good in one context, while it is less so in another. Therefore, we suggest generalizing and contextualizing the approach taken by Ulan et al. (2021), in which a score is defined to always have a range of [0, 1] and linear behavior. This means that scores can now also be compared and that a fixed increment in any score is equally valuable among scores. This is otherwise not the case for raw features. Metrics As Scores consists of a tool- and analysis suite and an interactive application that allows researchers to explore and understand differences in scores across groups. The operationalization of features as scores lies in gathering values that are context-specific (group-typical), determining an ideal value non-parametrically or by user preference, and then transforming the observed values into distances. Metrics As Scores enables this procedure by unifying the way of obtaining probability densities/masses and conducting appropriate statistical tests. More than 120 different parametric distributions (approx. 20 of which are discrete) are fitted through a common interface. Those distributions are part of the scipy package for the Python programming language, which Metrics As Scores makes extensive use of (Virtanen et al., 2020). While fitting continuous distributions is straightforward using maximum likelihood estimation, many discrete distributions have integral parameters. For these, Metrics As Scores solves a mixed-variable global optimization problem using a genetic algorithm in pymoo (Blank & Deb, 2020). In addition, empirical distributions (continuous and discrete) and smooth approximate kernel density estimates are available. Applicable statistical tests for assessing the goodness-of-fit are performed automatically. These tests are used to select some best-fitting random variable in the interactive web application. As an application written in Python, Metrics As Scores is made available as a package that is installable from the Python Package Index (PyPI): pip install metrics-as-scores. As such, the application can be used in a stand-alone manner and does not require additional packages, such as a web server or third-party libraries.
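
    As a simplified reading of the score transformation described above (not the metrics-as-scores API), the sketch below turns an observation's distance from an ideal value into a [0, 1] score using the empirical distribution of the group's own distances; the metric, groups, and ideal value are hypothetical.

```python
# Simplified sketch of a non-parametric score: the fraction of a group's
# observations that lie at least as far from the ideal value as the given
# observation. Data and ideal value below are invented.
import numpy as np

def score(value, group_sample, ideal):
    """1.0 = as close to the ideal as it gets in this group, 0.0 = farthest."""
    group_sample = np.asarray(group_sample, dtype=float)
    dist = abs(value - ideal)
    group_dist = np.abs(group_sample - ideal)
    return float(np.mean(group_dist >= dist))

# Hypothetical example: cyclomatic complexity in two application domains,
# with an ideal value of 5 in both contexts.
domain_a = [4, 5, 6, 7, 9, 12, 15, 18]
domain_b = [5, 5, 6, 6, 7, 7, 8, 8]
print(score(9, domain_a, ideal=5))   # 0.5 -> a fairly common distance here
print(score(9, domain_b, ideal=5))   # 0.0 -> extreme for this domain
```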

    Artifact: Quality Models Inside Out: Interactive Visualization of Software Metrics by Means of Joint Probabilities

    Assessing software quality, in general, is hard; each metric has a different interpretation, scale, range of values, or measurement method. Combining these metrics automatically is especially difficult, because they measure different aspects of software quality, and creating a single global final quality score limits the evaluation of the specific quality aspects and trade-offs that exist when looking at different metrics. We present a way to visualize multiple aspects of software quality. In general, software quality can be decomposed hierarchically into characteristics, which can be assessed by various direct and indirect metrics. These characteristics are then combined and aggregated to assess the quality of the software system as a whole. We introduce an approach for quality assessment based on joint distributions of metrics values. Visualizations of these distributions allow users to explore and compare the quality metrics of software systems and their artifacts, and to detect patterns, correlations, and anomalies. Furthermore, it is possible to identify common properties and flaws, as our visualization approach provides rich interactions for visual queries to the quality models' multivariate data. We evaluate our approach in two use cases based on: 30 real-world technical documentation projects with 20,000 XML documents, and an open source project written in Java with 1000 classes. Our results show that the proposed approach allows an analyst to detect possible causes of bad or good quality.
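
    As a sketch of the kind of joint distribution such a visualization can be built on, the following example estimates the joint density of two metrics with a Gaussian kernel density estimate from scipy; the data and metric names are invented, and this is not the artifact's implementation.

```python
# Sketch only: estimate the joint density of two software metrics with a
# Gaussian KDE and evaluate it on a grid, e.g., for a contour or heat-map
# view. The per-class measurements below are synthetic.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Hypothetical per-class measurements: lines of code and cyclomatic complexity.
loc = rng.lognormal(mean=4.0, sigma=0.6, size=500)
complexity = 1 + 0.02 * loc + rng.gamma(shape=2.0, scale=1.5, size=500)

kde = gaussian_kde(np.vstack([loc, complexity]))
xs = np.linspace(loc.min(), loc.max(), 50)
ys = np.linspace(complexity.min(), complexity.max(), 50)
grid_x, grid_y = np.meshgrid(xs, ys)
density = kde(np.vstack([grid_x.ravel(), grid_y.ravel()])).reshape(grid_x.shape)
print(density.shape)  # (50, 50)
```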